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Ground state and glass transition of the RNA secondary structure 
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Abstract. RNA molecules form a sequence-specific self-pairing pattern at low temperatures. We analyze 
this problem using a random pairing energy model as well as a random sequence model that includes 
a base stacking energy in favor of helix propagation. The free energy cost for separating a chain into 
two equal halves offers a quantitative measure of sequence specific pairing. In the low temperature glass 
phase, this quantity grows quadratically with the logarithm of the chain length, but it switches to a linear 
behavior of entropic origin in the high temperature molten phase. Transition between the two phases is 
continuous, with characteristics that resemble those of a disordered elastic manifold in two dimensions. For 
designed sequences, however, a power-law distribution of pairing energies on a coarse-grained level may be 
more appropriate. Extreme value statistics arguments then predict a power-law growth of the free energy 
cost to break a chain, in agreement with numerical simulations. Interestingly, the distribution of pairing 
distances in the ground state secondary structure follows a remarkable power-law with an exponent —4/3, 
independent of the specific assumptions for the base pairing energies. 



I 



PACS. 87.14.Gg DNA, RNA 
transitions 



87.15.-v Biomolecules: structure and physical properties - 64.70.Pf Glass 



1 Introduction 

The three-dimensional structure (i.e. conformation) of biomo 
is a fascinating topic due to its fundamental importance 
in modern biology p. Link between the structure of a 
biopolymer and its sequence information, however, remains 
at an empirical level due to the hitherto unyielding com- 
putational complexity in predicting the shape of a hetero- 
geneous polymer 2 3 4 5 6. At the heart of the problem 
is the lack of a general understanding on the energetics 
of a collapsed polymer in the presence of sequence-specific 
contact energies. Such a situation has been compared with 
the low temperature behavior of the spin glass model [718] . 
although the chain constraint and the unknown nature of 
sequence specificity may invalidate the analogy. 

In the present paper, we focus on the secondary struc- 
ture of RNA molecules [9T10lllli2li3j . RNA, like DNA, is 
a long chain molecule made of four different types of nu- 
cleotides: adenine (A), uracil (U), guanine (G) and cyto- 
sine (C). Under normal physiological conditions, an RNA 
molecule folds into a relatively compact shape which can 
be loosely described as a mixture of double-stranded he- 
lical segments (known as stems) and occasional single- 
stranded bulges and hairpins with tertiary contacts. The 
helical segments are stabilized by base-pairing and base 
stacking which represent dominant contributions to the 
energy of a folded structure. Unlike a ds-DNA molecule, 
however, each helical segment is made of two complemen- 
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tary segments from different parts of the same chain, run- 
ning in opposite directions. The matching of bases to form 
^c^igatson- Crick A-U and G-C pairs, and the energeti- 
cally less favorable wobble G-U pairs defines the secondary 
structure of an RNA molecule. The problem of RNA sec- 
ondary structure prediction is then to find the map of 
optimal pairings for a given sequence of the nucleotides 
(the primary structure) |14j. At finite temperatures, one 
has to consider structures that are not necessarily opti- 
mal in energy, but are nevertheless important due to their 
configurational entropy. 

Compared to protein folding, RNA secondary struc- 
ture prediction is a simpler problem due to the satura- 
tion of base-pairing 9 . In particular, for RNA molecules 
without the so called "pseudoknots" , pairing of bases in 
an RNA molecule may be represented by one-dimensional, 
non-intersecting rainbow diagrams|15). Thanks to this topo- 
logical constraint, the partition function of a chain of N 
bases can be determined through an exact dynamic pro- 
gramming algorithm whose computational complexity scales 
as N 3 |16I17 | . Consequently, chains of length up to a few 
thousand bases can be readily investigated numerically. 

From a statistical mechanics point of view, the key is- 
sues with regard to RNA secondary structures include a 
classification of possible phases of the chain in the limit 
N — » oo, and the characteristics of the equilibrium struc- 
tures in each of these phases [T3I15| . At sufficiently high 
temperatures, it is generally agreed that the system is in 
a "molten phase" with non-specific base-pairing. Various 
statistical properties of this phase, including the distri- 
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bution of pairing distances, are known through the an- 
alytic solution to the homopolymer version of the RNA 
problem |18j. A denning property of the molten phase is 
the universal amplitude of the logarithmic excess entropy 
of a finite chain [TH]. As the temperature decreases, a new 
type of behavior, with properties typical of disordered sys- 
tems, is seen. However, many details remain controversial |19l2QI21l22l2H] 



the system at finite temperatures is discussed in Sec. 4. 
In Sec. 5 we consider other specification of the random 
pairing energy and their effect on the properties of the 
ground state. Section 6 presents a summary and our main 
conclusions. 



Several simplifying models have been introduced in 
the study of the low temperature glass phase of an RNA 
molecule. Higgs considered a random heteropolymer model 
of RNA secondary structure formation|19|. In his model, 
only Watson-Crick pairing is allowed and each such pair 
is assigned a negative energy. Through numerical simula- 
tions of random sequences, he observed that the ground 
state is highly degenerate and the system at low temper- 
atures exhibits a broad distribution of the overlap func- 
tion. The same conclusion was reached in a recent work 
by Pagnani et al. who also studied the molten-to-glass 
transition j^Uj. The existence of a spin-glass type ground 
state in such a model is however disputed by Hartmann. |21| 

Bundschuh and Hwa have recently carried out exten- 
sive analytic and numerical studies of the RNA secondary 
structure problem ^S]- They have discussed in particular 
the nature and energetics of low-energy excitations in the 
glass phase, and presented a proof for the existence of a 
finite-temperature glass transition. They have shown that 
the scaling of pairing distances in the glass phase follows a 
different power from that of the molten phase (see discus- 
sion in Sec. 3.2). In addition, the finite-size correction to 
the free energy (termed pinching energy by the authors) 
grows as a power-law of the chain length, but the exponent 
is small and nonuniversal. 

Krzakala et al.\2,2\ introduced an alternative measure 
of the sequence-specific pairing which is a characteristic of 
the low temperature glass phase. Their approach is based 
on an analogy to the directed polymer problemj^H and 
the replica method. Their conclusion on the existence of 
a finite-temperature glass transition is in agreement with 
previous work. 

While the quantities introduced by Bundschuh and 
Hwa, and by Krzakala et al. provide effective measures 
of the glassy order in the low temperature phase, there is 
yet no microscopic understanding of the origin of the scale- 
dependent energies as seen in numerical work. In partic- 
ular, there is no compelling reason why power-law forms 
are the preferred choice for the observed scale dependence. 
This question is important not only from a theoretical 
point of view, but also when considering the effect of se- 
quence mutation and environmental perturbations (such 
as tertiary contacts, pseudoknots, and magnesium ions, 
etc.) on the RNA secondary structure. Therefore a better 
characterization of the properties of the low-temperature 
phase is desirable. 

The paper is organized as follows. In Sec. 2 we in- 
troduce the random pairing energy model studied in the 
present work and briefly review the numerical scheme used 
for exact computation of ground state and finite temper- 
ature properties. Section 3 contains results and analysis 
of various properties in the ground state. The behavior of 



2 The model and dynamic programming 

The statistical mechanics of the secondary structure of 
random RNAs is reviewed in Ref.|15|. An RNA molecule 
is defined by its nucleotide sequence. A secondary struc- 
ture of the molecule is a pairing pattern of bases on the 
sequence, where each base (indexed by its position i in the 
sequence) has at most one partner. As in most previous 
studies, we consider here only secondary structures that 
obey the "noncrossing" constraint, i.e., if base i pairs with 
base j > i, and another base k > i pairs with base I > k, 
then either i < j < k < I (separated) or i < k < I < j 
(nesting). This class of structures, which are the most 
common in nature, form the configuration space of the 
RNA secondary structures considered below. 

Realistic prediction of the thermodynamically favored 
RNA secondary structures requires a large parameter set 
derived empirically from pains-taking thermodynamic mea- 
surements over the years [25! • Its main purpose is to differ- 
entiate accurately local pairing alternatives. This compli- 
cation, we believe, is not necessary for a statistical char- 
acterization of the scaling properties in the low tempera- 
ture phase and around the glass transition in the random 
sequence ensemble. Instead, we consider here a much sim- 
pler model where the energy of a secondary structure S is 
given by, 

E[S\= £ e itj , (1) 
(i,j)es 

where tij is the pairing energy of base i with base j. The 
sum is over all base pairings of S. 

To complete the description of the model, we need to 
assign values to the pairing energies ejj for a given nu- 
cleotide sequence. The standard choice is to make tij de- 
pendent on the two nucleotides involved. For the random 
sequence ensemble, an alternative approach is to choose 
eij as independent random variables, as suggested in Ref. 15 . 
This was motivated at first by analytical considerations 
and supported by numerical evidence. In fact, the two ap- 
proaches become quite identical when the alphabet size 
exceeds sequence length, as then every possible pair has 
a different combination of partners for a typical random 
sequence. Considering that, for real RNA, each helical 
segment typically contains a consecutive stack of five or 
more paired bases (with more than 4 5 — 1024 possible 
sequences on each side), one may view the second ap- 
proach as defining a coarse-grained model on the scale of 
a helical segment. Previous work on sequence alignment 
has shown that the matching energy of two randomly se- 
lected sequences follows a distribution with an exponen- 
tial tail |26!27l28l29j . Thus, as a coarse-grained model of 
RNA secondary structures in the sense described above, 
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we choose e^j < to be independent random variables 
satisfying the distribution, 

P(6)=6 1 exp(6/6 ), (2) 

where eo = 1 sets the only energy scale of the problem. 

Due to the noncrossing constraint on the pairing pat- 
terns, the partition function 

Z(JV)=£exp(-i5[S]/T) (3) 

s 

of an RNA molecule of N bases at temperature T can be 
calculated using a dynamic programming algorithm |lrill7| . 
This is done based on the recursive relation 

3-1 

Z id = Z i4 - X + Z^k-ie-^^Zk+tj.t. (4) 

k—i 

Here Z^ j denotes the partition function of a contiguous 
segment of the molecule from position i to position j. 
Starting from the shortest segments of one base each with 
Zi t i = l,i = 1, 2, . . . , N, one obtains the partition func- 
tion Z(N) = Zi n in 0(N 3 ) elementary computations. 
At T = 0, the following equation can be used instead to 
calculate the ground state energies, 

Eij = m\Ui<k<j{Ei : k-i + Ek+i,j-i + efc,j}j (5) 

where as a convention we set e^j = for all i, and Eij = 
for i > j. 



3 The ground state 

In this section we present numerical results regarding the 
ground state of an RNA molecule in the random sequence 
ensemble. Chains up to N — 2048 bases are investigated, 
with a minimum of 1000 realizations of the pairing ener- 
gies. Results for shorter chains are obtained as a byprod- 
uct in the computation. 

3.1 Ground state energy 

It has been suggested |15!22| that the statistical mechanics 
of the RNA problem may be closely related to that of a 
directed polymer in a disordered medium, which has been 
studied extensively in the past (23] ■ In the latter case, the 
ground state energy of the polymer (or its free energy at 
finite T) contains a finite-size correction which grows as a 
power of the chain length [3~H32| . This energy is of the same 
order as the disorder- induced energy fluctuations, with an 
exponent that takes a universal value throughout the low 
temperature phase. It is thus interesting to examine such 
corrections for the RNA problem as well. 

The origin of an excess energy associated with a chain 
of finite length can be appreciated with the help of Fig. 
1. Dashed lines in the figure indicate pairing of the bases. 
Cutting the chain in the middle yields two shorter chains 




A B 

(a) (b) 

Fig. 1. Rainbow diagrams illustrating allowed base pairing 
(dashed line) on (a) a single chain, and (b) two separate parts 
when the chain is broken in the middle. 



half of the original size. All pairing patterns of the two 
shorter chains can be realized on the longer chain, but 
the reverse is not true. Therefore the free energy of the 
chain increases when it is broken into smaller parts. This 
property translates directly to an excess free energy for a 
chain of finite length. 

The importance of quantifying this excess energy has 
been stressed by Bundschuh and Hwa^|. Due to the non- 
crossing constraint, when two bases i and j on the chain 
form a pair, those within the segment delimited by the 
two are only allowed to pair among themselves. Therefore 
any pairing of the bases effectively defines a finite sys- 
tem isolated from the rest of the chain. For the pairing to 
be energetically favorable, its energy e^j must offset the 
energy cost for splicing out the segment inbetween. Argu- 
ments along this line can be used to discuss the stability 
of a given state as done in Ref. JS] to construct a lower 
bound on the glass transition temperature. 

Here we examine not only the average value but also 
the distribution of the excess energy as a function of the 
chain length. Due to the statistical fluctuations in the 
bond energies, the total ground state energy E(N) of a 
chain of length N has a fluctuation proportional to N 1 / 2 . 
This background fluctuation can be eliminated using the 
construction shown in Fig. 1(b). A chain of length 2N is 
formed by joining two chains A and B, each of length N. 
Let AEn = Ei t N + Epf + i,2N — Ei,2N be the energy gained 
when bases on chain A are allowed to pair with bases on 
chain B to form the ground state of the full chain. Apart 
from the energy of a single pair, this quantity is identical 
to the pinching energy introduced in Ref. ^H] ■ Chemically, 
it can be considered as the heat of "reaction" that brings 
the two halves together. Obviously, AEn is typically pos- 
itive but may happen to be zero when the ground state of 
the full chain breaks into two independent halves. 

Figure 2(a) shows the normalized distribution P(AE, N) 
of AE N for N = 2, 4, 8, . . . , 1024 on semi-log scale. As N 
increases, the peak of each curve shifts to the right while at 
the same time its width also increases. In addition, there 
is a finite statistical weight P {N) ~ N" 4 / 3 at AE = 
[see Fig. 4(a)] Figure 2(b) shows a scaling plot of the dis- 
tributions. Here (AE N ) and W N = y/(AE%) - (AE N ) 2 
denote the mean value and standard deviation of AEn, 
respectively. Convergence to a limiting form at N = oo 
starts from the middle of each curve and gradually ex- 
tends over to the wings. Interestingly, the tail of the dis- 
tributions at large AE decays as a simple exponential. 
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Fig. 2. (a) Distribution of the excess energy AEn of a finite 
chain for N = 2, 4, ... , 1024. (b) Convergence to a limiting 
form with zero mean and unit variance. Arrows indicate the 
direction of increasing N. 
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Fig. 3. Mean value and standard deviation of AEn against 
IniV. Solid and dashed lines represent polynomial fits to the 
data. 

Figure 3 shows (AEn) and Wn against IniV. It is ev- 
ident that the two quantities are not proportional to each 
other, i.e., the distributions shown in Fig. 2 can not be 
collapsed with a single energy scale at each N. Neverthe- 
less, the data can be represented nicely by a quadratic 
function in IniV for (AEn), and a linear function in IniV 
for W. This suggests that the disorder averaged ground 
state energy can be written in the form, 

(E(N)) = e iV + a + 61niV + cln 2 iV, (6) 

where eo is the energy per base in the infinite size limit. 
From the fit we obtain a = 0.81, b = 1.28, and c = 0.26. 

Although the logarithmic form © fits the data nearly 
perfectly, a power-law dependence can not be ruled out 
based on the numerical data alone. 15 22 Previously, Bund- 
schuh and Hwa^H] made the suggestion that the logarith- 
mic size dependence is more appealing given the small 
and nonuniversal exponent obtained from various models. 



Here we show that the difference in behavior for (AEn) 
and Wn is still consistent with a single energy scale at 
each N which grows as IniV. In such a scenario, the In 2 N 
term arises naturally as (AEn) also contains contribu- 
tions from smaller scales. Specifically, with reference to 
Fig. 1(b), we may first group bases on either side of the 
break into zones that are evenly spaced on a logarithmic 
scale, according to their distance R (measured in terms of 
number of bases along the chain) to the breaking point. 
Let I — In R be the index of zone I with a width of order 
R, and suppose that the typical total energy increase of 
bases in the zone caused by the break is proportional to 
hxR, Adding up contributions up to I = IniV yields the 
desired In 2 N dependence. In comparison, fluctuations of 
AEn do not contain this cumulative effect. 

The In N energy scale may be motivated from the ex- 
treme value statistics argument developed in Refs. 15 2 6I27I28I29| . 
According to Eq. when a new base is added to the end 
of a chain of length N, optimal pairing with interior bases 
is determined by the competition between the energy cost 
for perturbing the existing ground state [i.e., the first two 
terms on the right-hand-side of Eq. (JjJJ], and the energy 
gain from the newly formed pair. The perturbation is at its 
weakest when the partner base k is located at either end of 
the chain. In such a situation, the number of bases at such 
distances is small, so the available choices for the bond en- 
ergy ek,j are rather limited. On the other hand, when k 
resides in the middle of the chain, the perturbation is the 
strongest, but then there are of order N choices for e^ j 
which, according to the extreme value statistics, yield a 
typical energy gain proportional to IniV. If AEn grows as 
a power of N, this energy gain will not be enough to offset 
the energy cost associated with the perturbation. Conse- 
quently pairing with a base in the middle of the chain 
is extremely unlikely. This however would imply that the 
chain contains almost exclusively short-distance pairs. If 
this were the case, the system would have a finite corre- 
lation length and a bounded AEn, which contradicts our 
original assumption. Self-consistency thus requires AEn 
to grow slower than any power of N, but at least as fast 
as IniV. 



3.2 Pairing statistics 

In addition to the ground state energy, we have examined 
the statistics of pairing distance d in the ground state. 
When two bases i and j > i form a pair, their pairing 
distance is defined cis dij — -j 

In fact, for a chain of 
length N, pairs of size d are equivalent to pairs of size 
N — d. This becomes evident if we join the two ends of 
the chain to form a circle, in which case distance between 
the two bases is given by the smaller of d and N — d. 
Let P s (d) be the distribution of d. Symmetry then yields 
P s {d) = P S (N — d). Figure 4(a) shows the distribution 
P s (d) for a chain of length N — 2048, averaged over 1000 
realizations of the £ij's. From the plot, we see that, apart 
from those data points near d ~ iV/2 which are influenced 
by finite-size effects, P s (d) follows a power-law function dP 
with an exponent r\ = —4/3. Also shown in the figure is 
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10 10' 

length 

Fig. 4. Pairing statistics in the ground state, (a) Distribu- 
tion Ps(d) of pairing distance d at N = 2048 (solid line). Also 
shown is Po{N) against N (circles), (b) A scaling plot of the 
distribution of the optimal break point for N = 2, 4, . . . , 1024. 
Dashed line in both figures indicates a power-law function with 
an exponent rj = —4/3. 



Pq(N), the probability that the ground state breaks into 
two independent halves as shown in Fig. 1(b), against N 
which has a similar behavior. Note that, for each N, Pq(N) 
is the same as P(AE) at AE = [cf. Fig. 2(a)]. 

We have also examined the statistics of the location of 
base k where the minimum on the right hand side of Eq. 
10) is achieved. Let R be the distance of this base to its 
partner base j. The distribution P),(R,N), with N being 
the length of the interval is shown in Fig. 4(b) using 

the scaled variables. Here N = 2, 4, 8, ... , 1024. From the 
data collapse we conclude that P\(R, N) obeys scaling, 



Pb(R,N) = N~ 4/3 <P(R/N), 



(7) 



where <P(x) ~ x 4 / 3 for iCl. 

The scaling properties of base pairing in the ground 
state as discussed above are consistent with the rough- 
ness of "mountain diagrams" introduced in Ref.|15j. In 
the latter representation, a given secondary structure is 
mapped to a height profile following a simple rule: start- 
ing from one end of the chain, say i = with Hq = 0, one 
proceeds successively to the right, setting /i, = h%-i + 1 
{hi = h%— i — 1) if base i is paired with base j > i (j < i), 
and hi = if base i is unpaired. Bundschuh and Hwa 
have shown that the average value of hi as defined above 
grows as a power-law of the chain length N, h ~ 
where the "roughness exponent" £ = ( g = 0.67 ± 0.02, 
considerably larger than its value (o = 1/2 in the molten 
phase. As shown in Ref.jSS], the two exponents ( and rj 
satisfy a general scaling relation, 



4 Finite temperature properties and the glass 
transition 

At finite temperatures, one needs to consider the entropy 
associated with alternative pairing to determine the equi- 
librium structure of an RNA molecule. Comparing Eq. 

with Eq. ©, we see that qualitatively two types of 
behavior can be distinguished: (i) only one or a few terms 
on the right hand side of (TJJ contribute to Zij, in which 
case the situation is similar to that of the ground state; 
(ii) the number of terms that contribute significantly to 
Zij grows with the chain length, in which case pairing 
becomes non-specific and one is in the molten phase. 

As a quantitative criterion that differentiates the two 
situations, Bundschuh and Hwa|15| proposed to examine 
the size dependence of the free energy cost for imposing a 
pairing (termed "pinching" ) . At sufficiently high temper- 
atures, the pinching free energy AF grows with the pair 
size N as |TlniV, and hence is purely entropic. Based 
on an estimate of the energy gain for the best matched 
pair forbidden by the pinch, Bundschuh and Hwa argued 
that this behavior cannot continue below a certain tem- 
perature, and hence a glass transition is expected to take 
place. Therefore the size-dependence of AF can be used 
to locate the phase transition point. 

Following this line of thinking, we consider the statis- 
tics of AFn = Thi(Zi,2N /Zi,nZn+i,2n) which is the fi- 
nite temperature analog of AEjy defined in the previous 
section. Figure 5(a) shows the mean value of AFn against 
InN, with the high temperature behavior^)] (3/2)Tln N 
subtracted from the data. For T = 1.25 and below, there 
is a clear upward curvature in each data set, indicating 
presence of a In 2 N term, though its amplitude decreases 
with increasing temperature. At T = 1.5 and 1.75, how- 
ever, deviations from the expected high temperature be- 
havior is weak. Figure 5(b) shows the standard deviation 
W f ,n = V( Af n) ~ {AF N } 2 against InN. Using data 
points at large N, we extracted the slope A(T) of each 
curve and plotted the result against T as in the inset. The 
result can be summarized as, 



W F . 



N 



( A(T)hxN + B(T), T <T g 
\B(T), T > T g 



(9) 



Here A(T) = A (T - T g f, with T g ~ 1.7. 

The simple functional forms which fit well the numeri- 
cal data strongly suggest that there is an underlying sim- 
ple mathematical structure. It is quite conceivable that 
a renormalization group theory, similar to the one intro- 
duced in Ref. |27| for the unbinding transition of two het- 
eropolymers, can be devised. In the absence of such a the- 
ory, Eq. @ should merely be considered as a convenient 
representation of numerical data. 



(8) 



Equation JHJ holds both in the ground state and in the 
molten phase, where ?7o = —3/2 has been calculated ex- 
actly. 



5 Other models for the pairing energy 

We have argued in Sec. 2 that Eq. (J2J provides a generic 
description of the distribution of pairing energies on a 
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Fig. 5. (a) Mean and (b) standard deviation of AFm against 
In at a set of temperatures below and above the glass tran- 
sition. Inset in (b) shows the slope of each curve (and addi- 
tional ones not shown) against temperature. Dashed line is a 
quadratic fit with T g = 1.7. 



coarse-grained scale. To verify this hypothesis, and to find 
out to what extent the scaling properties obtained un- 
der @ remain universal, we consider in this section other 
forms of the pairing energy, and carry out a comparative 
study of their ground state properties. 
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5.1 A sequence-based model 

To get a flavor of the similarities and differences between 
random sequence models (with N random variables) and 
random pairing energy models (with iV 2 random vari- 
ables), we consider here a simplified four- nucleotide model 
incorporating the essential features of base-pairing energetics 
In addtion to the Watson-Crick A-U and G-C pairs, we 
allow the less favorable G-U pair. A stacking energy is 
included for the propagation of short helices, i.e., if two 
consecutive bases i and i + 1 pair with j and j — 1, re- 
spectively, an additional energy E s is gained. The mini- 
mal hairpin loop length is set at 4 nt. Results presented 
below are for the following choice of energy parameters: 
E s = —3;Eqc — —3—E s ;Eau = —2—E s ;Eq\j = —1—E S . 
With this specification, isolated pairings are disfavored. In 
the ground state, the typical length of a helix is about five 
base pairs for a random sequence. 

Figure 6 shows the mean value and standard deviation 
of AEm against In N for the random sequence ensemble. 
The behavior is very similar to that of Fig. 3. We have also 



Fig. 6. Mean value and standard deviation of the excess energy 
of a finite chain in the random sequence ensemble. 



examined the statistics of the pairing distances whose dis- 
tribution fits well to the scaling form Q) with r/ = —4/3. 
These and other properties of the ground state will be 
reported in detail elsewhere. 



5.2 Power-law distribution of the pairing energy 

As we mentioned above, the logarithmic size dependence 
of pairing energies is a generic feature of matching statis- 
tics for random sequences. Through evolution, however, 
sequences that lead to more stable structures may be se- 
lected for functional advantages, including possibly RNA's 
with longer matched segments. Indeed, the secondary struc- 
ture of many real RNA's show extended stretches of du- 
plices which are not expected of a random sequence. This 
observation motivates us to examine the ground state en- 
ergetics and pairing pattern under a power-law distribu- 
tion of the pairing energies, 



P(e) 



6 < -1. 



(10) 



Figure 7 shows the mean and standard deviation of 
'■n against N for a = 2, 3, and 4. At sufficiently large 
N, the two quantities become proportional to each other, 
indicating a single energy scale AEm ~ N u . The exponent 
ui can be related to a from the following consideration. On 
a chain of length N, there are N(N — l)/2 possible pair- 
ings. The lowest pairing energy e m j n is determined by the 
condition N 2 \e m i n \~ a ~ 1. Hence £,„;„ ~ —N 2 / a . Assum- 
ing the energy cost for breaking a chain into two halves is 
dominated by e m j n of the strongest bond, we obtain, 



2/a, 



(11) 



which agrees well with the numerical data. 

We have also investigated the distribution of the pair- 
ing distance under Eq. 1)10(1 . Interestingly, the results are 
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Fig. 7. Power-law distribution of the pairing energies: a — 2 
(diamond), a — 3 (square), and a = 4 (circle). Open symbols 
represent the mean value of AEn while filled symbols repre- 
sent its standard deviation. Solid, dashed, and dot-dashed lines 
indicate corresponding power-laws. 
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Fig. 8. Scaling plot of the distribution of the optimal break 
point along a chain of length N for the power-law pairing en- 
ergy model. 

quite insensitive to the value of a, and the exponent r\ is 
unchanged from its value —4/3 under (J2J. Figure 8 shows 
a representative case at a = 2. The result is nearly identi- 
cal to Fig. 4(b). The good data collapse confirms validity 
of Eq. Q for power-law pairing energies. 



6 Summary and conclusions 

In this paper we have investigated properties of the ground 
state of RNA secondary structure models, and the transi- 
tion to the molten phase at a finite temperature. We have 
focused our attention on the excess energy AEn (and the 



similarly defined excess free energy AFjy) of a chain of 
A" nucleotides due to the presence of boundaries. Since 
any pairing of two bases automatically splices out a finite 
segment of the chain, this excess energy defines the char- 
acteristic energy scale for the competition between base 
pairing on a given length scale (measured by the num- 
ber of nucleotides inbetween) and the adjustments in the 
secondary structure necessary in order to accommodate 
the pairing. From numerical investigations of a random 
energy model with exponential distribution of pairing en- 
ergies, and a random sequence model with more realistic 
base pairing and base stacking energies, we have estab- 
lished that AFm has a fluctuation proportional to In A" in 
the entire low temperature glass phase. The mean value 
of AFn, on the other hand, acquires a In 2 N term due to 
accumulation of contributions from smaller scales. 

As temperature increases towards the transition, we 
observe that fluctuations of AFn, or equivalently, varia- 
tion of the free energy cost to accommodate an inserted 
pair [i.e., the relative strength of different terms in the 
sum in Eq. Q] decreases, hence base pairing becomes less 
specific. As T — > T g , the amplitude of the In AT term van- 
ishes quadratically with the distance to the critical point. 
This behavior is in striking resemblance to the glass tran- 
sition of an elastic manifold in two dimensions subject to 
a random, uncorrelated potential [25I23]- It would be in- 
teresting to quantify this connection mathematically. 

We have also studied scaling properties in the ground 
state under a power-law distribution of the pairing en- 
ergies eij. Such distributions may be encountered in a 
coarse-grained description of real RNA molecules with se- 
quence design 35 36 . In this case, AEjq competes with the 
strongest bond on the chain. Based on extremal statistics 
arguments, we were able to express the exponent ui char- 
acterizing the power-law growth of AEn with N in terms 
of the exponent a for the power-law distribution of e^.j. 
This relation is verified by numerical data. 

Geometrical properties of base pairing in the RNA sec- 
ondary structure can be characterized with the distribu- 
tion of pairing distances. Our studies of the ground state 
shows that this distribution is well described by a power- 
law decreasing function with an exponent r\ = —4/3, in 
agreement with previous findings 15 . This behavior is sur- 
prisingly insensitive to the models used for the bond en- 
ergies. In the molten phase, however, it takes the value 
770 = —3/2. It would be desirable to find an analytic foun- 
dation for these observations. 

LHT would like to thank Ralf Bundschuh and Terry Hwa for 
many interesting discussions on the problem. Research is sup- 
ported in part by the Research Grants Council of the Hong 
Kong SAR under grant HKBU 2061/01P. Computations were 
carried out at HKBU's High Performance Cluster Computing 
Centre Supported by Dell and Intel. We gratefully acknowledge 
the support of the BIOSUPPORT project jhttp: / /bioinfo.hku.hk) 
for providing bioinformatics resources and computational ser- 
vices from the HKU Computer Centre. 

Note added: after submission of the paper we became 
aware of a manuscript by M. Lassig and K. J. Wiese|3"?] 
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where a field-theoretic renormalization group treatment 
of the glass transition is presented. The analysis has been 
further refined 3^ and yielded a value ( g ~ 0.64 at the 
glass transition. 
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